Abstract

One crucial aspect of CLIL-based foreign language learning in instructional settings
is vocabulary growth. Research should therefore be interested in how CLIL fosters
vocabulary learning, and the apparent shortage of data-driven quantitative research
on vocabulary growth in CLIL is problematic. The present paper reports findings from
a mixed-methods study of vocabulary growth in an Austrian lower secondary school CLIL
setting, with English as the language of instruction and learning. The aim of the
study was to analyse how the use of CLIL in the English classroom could benefit
learners in their acquisition of vocabulary in the target language. First, a
repeated-measures design with experimental and control groups assessed receptive
vocabulary growth by means of a standardised vocabulary size test. Second, students'
questionnaire data as well as vocabulary profiling of the CLIL teachers' linguistic
input explored possible covariates for the vocabulary test scores. We found that
CLIL-related effects were only co-determined by input frequency, while extra-mural
factors did not play any role in this study. As a consequence, overly optimistic
expectations regarding the linguistic impact of CLIL in a mixed-ability setting
guided by a predominantly implicit language teaching approach need to be
re-evaluated critically.

The more the better: a review of vocabulary growth
in a secondary-school CLIL context


Abstract (Spanish version)

A relevant aspect of CLIL-based foreign language learning in instructional contexts
is vocabulary growth. Consequently, research should take an interest in how
vocabulary learning is promoted in CLIL. It is alarming to observe an apparent lack
of research on vocabulary growth based on quantitative data. The present article
reports the findings of a mixed-methods study of vocabulary growth in an Austrian
lower secondary school in a CLIL context in which English is used as the language of
instruction and learning. The aim of this study was to analyse how the use of CLIL
in the classroom could benefit learners with regard to the acquisition of vocabulary
in the foreign language. First, a standardised vocabulary size test was administered
in repeated measurements to the experimental and control groups. Then, a student
questionnaire was used, and possible covariates of the vocabulary test scores were
explored by means of a vocabulary profile of the CLIL teachers' linguistic input. It
was found that CLIL-related effects were co-determined by frequency, whereas
extra-mural factors did not play a major role in this study. Consequently, overly
optimistic expectations about the impact of CLIL in a mixed-ability context guided
predominantly by an implicit language teaching approach need to be seriously
re-evaluated.

Keywords: CLIL; receptive vocabulary growth; vocabulary size test; extra-mural
factors.

The more the better: a review of vocabulary growth
in a secondary-school Content and Language
Integrated Learning context


Abstract (Portuguese version)

A relevant aspect of foreign language learning based on Content and Language
Integrated Learning (CLIL) in institutional contexts is vocabulary growth. Research
should therefore be more concerned with its promotion. It is alarming to observe an
apparent lack of research on this topic based on quantitative data. This article
reports the findings of a mixed-methods study of vocabulary growth in an Austrian
secondary school in a CLIL context in which English is used as the language of
instruction and learning. The aim of this article is to analyse how the use of CLIL
in the classroom can benefit learners with regard to the acquisition of vocabulary
in the foreign language. First, a test to determine vocabulary size was administered
in repeated measurements to the experimental and control groups. Next, a student
questionnaire was used and, by means of a vocabulary profile of the CLIL teachers'
linguistic input, possible covariates of the vocabulary test scores were explored.
It was found that the effects related to CLIL in a mixed-ability context guided
predominantly by an implicit language teaching approach need to be seriously
re-evaluated.

Keywords: CLIL, receptive vocabulary growth, vocabulary size test, extra-mural
factors.

INTRODUCTION

CLIL is "an educational approach where curricular content is taught through
the medium of a foreign language, typically to students participating in
some form of mainstream education at the primary, secondary, or tertiary
level" (Dalton-Puffer, 2011, p. 183). One of its major premises holds that
providing rich amounts of foreign language input in a mostly immersive context
will lead to higher proficiency in the target language (Dalton-Puffer,
2011; Pérez-Cañado, 2011). However, some critical voices have been raised
lately (see Bruton, 2011; Paran, 2013).

This criticism became the driving force for the present study. It investigated
CLIL-based vocabulary growth in lower secondary Austrian students. This sample
arguably differed from more typical European CLIL contexts concerning student
selection and target language contact time. The students in this CLIL research
project, for example, worked within a non-selective, sub-optimal (Grandinetti,
Langellotti & Ting, 2013), rather low-achieving learning background and a modular
CLIL approach. The hallmark of modular CLIL is a sequence of various CLIL projects
spread across the school year, each interspersed with mother tongue teaching
sequences (Krechel, 2005). Therefore, notwithstanding a certain lack of necessary
CLIL criteria according to Tedick & Wesely (2015), this setting of non-selectivity
of student population and modular input constitutes an "authentic" European CLIL
context (author 1, 2007; Denman, Tanner & de Graaff, 2013; Krechel, 2005).

CLIL studies repeatedly report visible growth in areas such as receptive and
productive vocabulary (Dalton-Puffer, 2011), and CLIL proponents have pointed out
that such growth can be expected to happen even after a comparatively short time of
exposure within an immersive or incidental language learning approach (Pérez-Cañado,
2011). This approach has had a marked influence on CLIL and foreign language
pedagogy (Ellis & Shintani, 2013). Llinares & Whittaker (2009, p. 189), for example,
maintain that "in most courses run by content teachers only, a foreign language is
only used as a vehicle for learning content with the assumption that this
will lead the student to learn the language naturally and incidentally".

However, there seem to be three challenges to such a view. First, recent
theoretical considerations in second language acquisition (SLA) research cast
serious doubt on a simplistic relationship between input and language learning.
Ortega (2015, p. 259), for example, considers input to be only one of several
ingredients of SLA, "necessary but not sufficient, and perhaps not even the most
crucial one". Second, positive evidence for vocabulary growth in CLIL appears to be
particular to studies coming from environments where access to CLIL depends on
school-based selection procedures, based on criteria such as language proficiency,
parental background and school achievement (Bruton, 2011; Dalton-Puffer, 2011;
Küppers & Trautmann, 2013; Rumlich, 2013). Third, there is a growing body of
evidence on the limited and possibly non-optimal effect of incidental language
growth in instructed settings in general (Laufer & Nation, 2012; Leow, 2015; Lyster,
2013; Nation, 2011).

All in all, vocabulary teaching and learning in CLIL is still strongly
affected by beliefs in the effectiveness of such a language bath metaphor
(Hüttner, Dalton-Puffer, & Smit, 2013),¹ even though Vollmer (2010, p. 50)
states that there is "a paucity of representative and empirically valid studies
concerning the strengths of CLIL students and thus a lack of evidence
concerning the central assumptions about the benefits and the superiority
of CLIL programmes". This sentiment is also supported by Bonnet & Dalton-Puffer
(2013), and the situation is not helped by the even greater scarcity of research
into CLIL for low-achieving populations (Denman, Tanner & de Graaff, 2013;
Grandinetti, Langellotti, & Ting, 2013; Schwab, 2013).

Finally, we need to point out that we are fully aware of the long tradition of
bilingual and immersion programmes in many different parts of the world (Genesee,
Lindholm-Leary, Saunders & Christian, 2006; Tedick & Wesely, 2015). Our focus on
mostly European CLIL studies is, in our opinion, justified by important differences
between CLIL and immersion programmes and by the particular research aim and context
of our study (Dalton-Puffer, Llinares, Lorenzo, & Nikula, 2014; Lasagabaster &
Sierra, 2010). In Austria, for example, CLIL teaching is guided by a highly flexible
legal context, which allows schools to set up locally appropriate and tailor-made
programmes. These can range from short, project-based modules to one-year courses in
which English is used as a means of instruction for one or more subjects. The
modular programme, as described above, has become a very popular CLIL approach in
lower secondary and primary education in Austria (Gierlinger, 2007).


1 Interestingly, some major CLIL methodology books (Coyle, Hood & Marsh, 2010; Dale &
Tanner, 2012; Deller & Price, 2007; Mehisto, Frigols & Marsh, 2008) offer sets of
activities and didactic advice going beyond this naturalistic and immersive approach.

To get a solid basis for our research design we reviewed 15 studies
of quantitatively measured vocabulary growth in European CLIL classes.
These studies, at first sight, unanimously show advantages in vocabulary
growth for CLIL. However, a closer look revealed various caveats, and
the purported advantages therefore need to be interpreted with caution.

For a start, since CLIL classes in these studies normally received extra language
support, possible input frequency effects through additional exposure need to be
taken into account (Dalton-Puffer, 2011; Jiménez Catalán & Ruiz de Zarobe, 2009).
Second, reporting on absolute vocabulary gain can be misleading. When Pietilä &
Merikivi (2014), for instance, described an advantage of CLIL classes with regard to
absolute vocabulary growth, they failed to point out that the non-CLIL learners were
eventually rapidly catching up, showing a far better relative gain. Third, similar
to the above, vocabulary gain may need to be investigated more carefully in
terms of its relative growth in comparison to control groups. As Mewald,
Prenner, and Sprenger (2004, p. 12) report, significant differences between
CLIL and control groups, as assessed in year six, were in fact levelling out
by year eight. Therefore, their initially proposed Scherenhypothese² had
to be discarded. A similar phenomenon was reported in Admiraal, Westhoff,
and de Bot (2006), where vocabulary scores ceased to increase after
four years of CLIL instruction. Fourth, effects attributed to CLIL exposure
might actually be mediated by external factors. Sylvén (2007), for example,
pointed out that the CLIL-induced advantages found in that study were in
fact co-determined by extra-mural factors. Finally, there is the ongoing
methodological issue of finding both an appropriate tool and an appropriate
design in order to measure vocabulary growth longitudinally
(Schmitt, 2010; Dóczi & Kormos, 2016).


2 This hypothesis predicts a widening gap between low and high achievers over a
longer period of time through CLIL exposure.

In light of the current state of affairs, the empirical evidence on the superiority
of CLIL-induced vocabulary learning remains ambiguous, as voiced, for example, by
Vollmer (2010). This paper tries to address this gap by providing empirical data
from a project with Austrian lower secondary school students. As mentioned
above, CLIL in these schools was carried out in a modular project format,
in non-selective classes, and through mostly implicit language instruction
(Ellis et al., 2009; Gierlinger, 2015). We formulated two research hypotheses
for this study.

Hypothesis 1: After about six months of target language exposure through modular
CLIL, CLIL learners will outperform non-CLIL learners with respect to
their receptive vocabulary knowledge as measured in relative gains. In other
words, CLIL learners will show significantly higher receptive vocabulary
growth at the post-test.

Hypothesis 2: A possible superiority of CLIL-induced receptive vocabulary growth
will be, apart from the CLIL intervention, co-determined by extra-mural factors.

METHOD 

Context and design of the study 

A data-driven mixed-methods study was carried out between 2010 and
2011, combining both naturalistic qualitative and manipulated quantitative
classroom data. Following a quasi-experimental, non-randomised pre-/post-test
control-group design, vocabulary test scores were taken twice
from all students in order to quantify their vocabulary size before and after
the instructional intervention. The instructional intervention was the
exposure to CLIL teaching. In our setting, CLIL teaching was exclusively
done through modular projects. Our CLIL teachers carried out around 5 to 7
CLIL projects of up to 4 weeks each throughout the school year.
The overall contact time amounted to either 60 or 80 additional hours of CLIL
teaching. The different figures are the result of one class being exposed to
CLIL in two subjects and of different project lengths.

We are acutely aware that such a research design could be challenged
because it confounds two variables, namely methodology and language input.
However, CLIL (through English) in a European classroom almost necessarily
entails additional exposure to the foreign language (Dalton-Puffer,
2011). Therefore, the teaching method (CLIL) and this additional exposure
are inevitably confounded. European CLIL research seems to have accepted
this dilemma as an intrinsic design problem. In authentic educational settings,
a clear-cut separation of these two variables can arguably, and regrettably,
not be modelled as an experimental condition for English.

The following table summarises the instructional and learning environment
of all five classes.


Table 1. Instructional and learning environment of the CLIL and non-CLIL classes

Classes                        CLIL 1      CLIL 2              Control 1   Control 2   Control 3
Students                       21          21                  16          24          16
English classes (hours/week)   3           3                   3           3           3
Students' age                  13-14       13-14               13-14       13-14       13-14
Course book (CB)               Subject CB  Subject CB          Subject CB  Subject CB  Subject CB
CLIL contact time (hours)      ca. 60      ca. 80              0           0           0
CLIL subjects                  Geography   Chemistry, History
CLIL-language instruction      Predominantly implicit; modular projects; bilingual materials


While the control group had received no extra language input, the
students from the CLIL group had gone, depending on the number of CLIL
modules throughout the school year, through either 60 or 80 hours of extra
CLIL class time within the treatment period. This happened as part of
the schools' language enrichment policy. As far as CLIL methodology was
concerned, most of the teaching was held in English, and there was hardly
any pre-planned and systematic language-focused work. Teachers'
language interventions were predominantly reserved for quick content
knowledge clarifications, which also resulted in some code switching. The
following quote by a CLIL teacher seems to be representative of the language
teaching policies: "Of course, students have to learn technical terms
but that is not any focused vocabulary work, it is just the German translation
so that one knows it when one needs it" (Gierlinger, 2015). Students
were encouraged to speak English, but code switching was not strictly
forbidden. CLIL class 1 used English subject course books; CLIL class 2
worked with English materials provided by the teachers. These materials
were marginally but not systematically enhanced, for example by providing
translations or short definitions. The teacher in CLIL class 1 was a language
and subject specialist, whereas the teachers in CLIL class 2 were
only subject specialists.

Participants 

In our study, 87 students from four different Austrian lower secondary
schools, 45 boys and 42 girls (mean age = 13.79 years, SD = 0.59, range = 12-14.5
years), took the standardised vocabulary test twice (t1 = November 2010,
t2 = May 2011). Such an interval may appear short for vocabulary growth
in quasi-immersive settings, but other studies, such as Grandinetti,
Langellotti, and Ting (2013), worked with even shorter intervals and less input.
What is more, our control-group design was geared towards tracing even
minute growth effects across this comparatively brief time span.

Our sample consisted of an experimental group (two CLIL classes,
n = 39) and a control group (three regular classes, n = 48). One class
attended two CLIL subjects (chemistry and history). The CLIL students were
not preselected but formed part of a whole-class and mixed-ability strategy
within the school's overall language policy.

The students' mother tongues included, apart from German (83%),
Albanian, Arabic, Mandarin Chinese, Romanian, Serbo-Croat-Bosnian (SCB),
Tagalog, and Turkish. The CLIL students participated in this study as part
of their school-wide CLIL enrichment project; the three classes that served
as the control group were recruited in order to match the experimental
group for type of school, students' L1, age group, exposure to regular English
classes, their English textbook, as well as the general communicative
language teaching approach.

Materials and procedure

Three different tools were used for data elicitation. First, the effect of the
instructional intervention (CLIL teaching) was assessed by measuring vocabulary
size before and after exposure. For vocabulary measurement, the
standardised and computer-based vocabulary size test X-Lex: The Swansea
Levels Test (Meara & Milton, 2003) was chosen. X-Lex measures vocabulary
size by prompting students to rate 120 English words from several vocabulary
frequency bands as either known or not known (including nonce-words
as distractors). From these ratings, a test score is calculated which reflects
a student's vocabulary breadth. The rationale for this choice was manifold.
First, X-Lex has already been recommended for vocabulary research in CLIL
(Canga Alonso, 2013). Second, receptive vocabulary proficiency, as
tested by X-Lex, is apparently strongly related to word learning by incidental
exposure, which is typical of CLIL environments (Jiménez Catalán & Ruiz
de Zarobe, 2009, p. 84). Third, X-Lex has been standardised and validated
for speakers of English as a second language in a number of studies, though
without resorting to one particular norming group (Huibregtse, Admiraal &
Meara, 2002; Mochida & Harrington, 2006). Fourth, it measures vocabulary
size against pre-defined corpora of different word frequencies, thus tracing
growth constrained by word frequency. Fifth, students often react positively
to computer-based applications, much more so than to paper-and-pencil
designs. Sixth, from a pragmatic view, the schools only allowed for short
periods of testing time, which in turn ruled out a more comprehensive
assessment tool. Finally, the software automatically produced an output file
that was easily saved and fed into spreadsheets and statistical software;
this, in turn, ruled out well-known problems with computerisation procedures
and the treatment of missing values.
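
How a yes/no test of this kind can turn ratings into a size estimate may be illustrated with a small sketch. The scoring rule used here (50 points per real word rated as known, a 250-point deduction per nonce-word rated as known, 100 real words plus 20 nonce-words) is an assumption based on how X-Lex scoring is commonly described; it is not stated in this paper. The function name, the floor at zero, and the sample figures are hypothetical.

```python
def xlex_score(real_yes: int, nonce_yes: int,
               points_per_word: int = 50,
               penalty_per_nonce: int = 250) -> dict:
    """Raw and adjusted score for a yes/no vocabulary size test.

    Assumed layout (not taken from the paper): 100 real words and
    20 nonce-words, i.e. 120 items in total.
    """
    if not 0 <= real_yes <= 100 or not 0 <= nonce_yes <= 20:
        raise ValueError("counts outside the assumed 100 + 20 item layout")
    raw = real_yes * points_per_word
    # Each nonce-word rated as "known" is treated as a false alarm;
    # flooring the result at zero is an illustrative choice.
    adjusted = max(raw - nonce_yes * penalty_per_nonce, 0)
    return {"raw": raw, "adjusted": adjusted}


# Hypothetical learner: 62 real words rated as known, 2 false alarms
print(xlex_score(62, 2))
```

Under these assumptions the maximum score is 5,000 points, which is at least compatible with the group means of roughly 2,500 to 2,850 points reported in the results.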

In order to examine the lexical variety of teacher input (more than
11 hours of videoed classroom observation), we ran frequency analyses
of the teachers' spoken input using VocabProfile. This software is part of
the New General Service List and the New Academic Word List (Browne,
Culligan, & Phillips, 2013) and was adapted for online application by Tom
Cobb from the Université du Québec in Montréal, Canada (Cobb, 2014).
VocabProfile performs lexical text analyses by dividing a given corpus
into several categories by frequency. These include k1, the most frequent
1,000 words of English, k2, the second most frequent thousand words of
English, up to k25 (based on the British National Corpus - BNC20 as the
reference corpus), as well as academic words of English and a residual
category. It thereby assesses the proportions of low and high frequency
vocabulary, indicating lexical variety. In addition to that, the software
returns standard lexical statistics, such as type-token ratios of the corpus.
The general reliability of this tool was assessed, among others, in studies
by Meara and Fitzpatrick (2000) as well as Cobb and Horst (2001). In
our study the teachers' videoed input was transcribed and then fed as a
text file into the software.
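
The principle behind such a frequency profile can be sketched in a few lines. This is not VocabProfile's actual implementation: the function name, the tokenisation rule, and the tiny word lists are hypothetical stand-ins for the real k1/k2 lists, chosen only to show how tokens are sorted into bands and a residual off-list category.

```python
import re
from collections import Counter


def frequency_profile(text: str, bands: dict[str, set[str]]) -> dict[str, float]:
    """Share of tokens (in %) that falls into each frequency band.

    Tokens matching none of the bands are counted as 'off-list'.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    labels = list(bands) + ["off-list"]
    if not tokens:
        return {label: 0.0 for label in labels}
    counts = Counter()
    for token in tokens:
        for band, words in bands.items():
            if token in words:
                counts[band] += 1
                break
        else:
            # No band contained the token
            counts["off-list"] += 1
    return {label: round(100 * counts[label] / len(tokens), 2) for label in labels}


# Toy word lists, hypothetical and far smaller than real k1/k2 lists
toy_bands = {
    "k1": {"the", "water", "is", "a", "and", "we", "can", "see", "it"},
    "k2": {"liquid", "boil"},
}
sample = "We can see the water boil and it is a liquid called solvent"
print(frequency_profile(sample, toy_bands))
```

In this toy example the subject-relevant word "solvent" ends up in the off-list category, simply because it is missing from the supplied lists.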

The third methodological tool was a background questionnaire. Complementing
the assessment of vocabulary breadth through X-Lex, all participants
from the experimental group filled out a background questionnaire
at the time of the first measurement in a paper-and-pencil fashion. The
questionnaire explored extra-mural English-related activities along with
bio-data such as gender, age, family background, school grades in the CLIL
subjects, and self-assessed proficiency in the four skills listening, reading,
writing, and speaking.

RESULTS 

The analyses of the vocabulary test first focussed on the overall as well as
k1 scores (1,000 most frequent words of English) at t1 and t2.³ As a preliminary
step, it was checked that assumptions of normality within the data were met and that
there was no over-homogeneity within the variances. Normality was first
checked through the inspection of Q-Q plots. Moreover, both Shapiro-Wilk
tests and Anderson-Darling tests confirmed that there was no significant
departure from normality (p-values of both groups and both measurements
> .05). Variances across both groups were checked through Bartlett's
tests of sphericity (all p-values > .05).

A first inspection of the two test scores and a visual display shows 
that both groups exhibit remarkable similarities (see Table 2, Figure 1). 


3 The raw data from the vocabulary test and the context questionnaire contained a
couple of missing values due to technical or organisational problems. Thus, the
number of cases entering the analyses varies slightly.

Table 2. t-test results for the groups' overall X-Lex vocabulary test scores at t1 and t2

      CLIL group               Control group            t-test
      M         SD       n     M         SD       n     95% CI           d     t      df
t1    2758.97   746.56   39    2516.67   650.80   48    -0.09 to 0.79    .35   1.59   76
t2    2842.86   773.19   35    2749.32   725.11   44    -0.33 to 0.58    .13   0.55   71

Note. M = mean, SD = standard deviation, d = Cohen's d.


Figure 1. Boxplots and interaction plot for the X-Lex
scores at t1 and t2 by group




Figure 2. Boxplots and interaction plot for the X-Lex
scores at t1 and t2 by group

What we can see in Table 2 and the two boxplots to the left of Figure 1
is that the groups' results do not differ much. CLIL students show the
highest scores at both t1 and t2, as can be seen in the length of the upper
whiskers in the plot. Both groups show a considerable range and variability
in test scores (all SDs above 650 points). The proximity of the medians
in the boxplots and the overlap of the box notches (quasi confidence
intervals) illustrate that group medians do not seem to differ significantly,
neither at t1 nor at t2. Independent-samples t-tests confirmed that the CLIL
group had a slightly but not significantly⁴ better start at t1, with a small
standardised effect size of d = 0.35, and behaved similarly to the control
group at t2, with a negligible effect size of d = 0.13.
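
The effect sizes and test statistics in Table 2 can be re-derived from the reported means, standard deviations, and group sizes alone. The sketch below is a plausibility check, not the authors' analysis script. It assumes that d was computed with the pooled standard deviation and that the t-tests were Welch tests for unequal variances; the latter is an inference from the reported degrees of freedom, not something the paper states. Function names are illustrative.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d based on the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = sd1**2 / n1, sd2**2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df


# Summary statistics from Table 2: CLIL (M, SD, n), control (M, SD, n)
table2 = {
    "t1": (2758.97, 746.56, 39, 2516.67, 650.80, 48),
    "t2": (2842.86, 773.19, 35, 2749.32, 725.11, 44),
}

for label, stats in table2.items():
    d = cohens_d(*stats)
    t, df = welch_t(*stats)
    print(f"{label}: d = {d:.2f}, t = {t:.2f}, df = {df:.0f}")
```

Run on the Table 2 figures, this reproduces d = 0.35, t = 1.59, df = 76 for t1 and d = 0.13, t = 0.55, df = 71 for t2, which suggests that the assumptions above match the reported analysis.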

The interaction plot on the right-hand side of Figure 1 illustrates a
mild increase in test scores over time for the CLIL group as well as a
pronounced increase for the control group. A one-way repeated-measures ANOVA over
group interacting with time revealed that there was no significant main effect
for the CLIL treatment (F(1, 77) = 1.72, p = .19), nor for the interaction over
time (F(1, 77) = 1.22, p = .27). In order to check for a regression effect, the
vocabulary scores at t2 were modelled using an OLS regression with the test
scores at t1 and the grouping factor as predictors. However, even in such
a model with the test scores at t1 held constant (F = 11.9, df = 2, 76, p < .001,
R² = 22%), the treatment effect was still non-significant (β = 49.38, t(151.67) =
0.33, p = .75) and had a negligible magnitude of ηp² = 0.0014. Thus, coming
back to hypothesis 1, it is not the CLIL exposure over time that predicts the
vocabulary gain between t1 and t2.
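
The negligible effect size follows directly from the reported regression coefficient. As a minimal check, the sketch below reads the bracketed value 151.67 as the standard error of the treatment coefficient (an assumption; the paper does not label it) and uses the 76 residual degrees of freedom of the model. The function name is illustrative.

```python
def partial_eta_squared(t: float, df_residual: int) -> float:
    """Partial eta squared for a single regression coefficient."""
    return t**2 / (t**2 + df_residual)


beta = 49.38   # treatment coefficient as reported
se = 151.67    # bracketed value, read here as the standard error (assumption)
t_value = beta / se
eta_p2 = partial_eta_squared(t_value, df_residual=76)
print(f"t = {t_value:.2f}, partial eta squared = {eta_p2:.4f}")
```

This yields t = 0.33 and a partial eta squared of 0.0014, in line with the values given in the text.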

4 All p-values in this paper refer to a significance level of .05.

So far, these results are somewhat at odds with predictions made
by many CLIL proponents. While we can see vocabulary growth in both
groups, and while the CLIL group outperforms the control group in terms
of absolute test scores, the relative gain of the control group exceeds that of
the CLIL pupils by far. In order to investigate such idiosyncratic behaviour, two
possible explanations will be explored in the following. The first one relates
to frequency effects as well as the interaction of the vocabulary input
students received and justifiable expectations about vocabulary growth
based on this specific input.⁵ The second one focuses on extra-mural
influences (Sylvén, 2007; Sylvén, 2013), which might prove to be co-determining
factors in our design.
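
The contrast between absolute scores and relative gain can be made concrete with the group means from Table 2. The sketch below is only illustrative: it works on group means, whereas the group sizes differ slightly between t1 and t2 because of missing values, so the percentages are rough indications and not the statistics tested in the paper. The function name is an assumption.

```python
def gains(mean_t1: float, mean_t2: float) -> tuple[float, float]:
    """Absolute gain in points and relative gain in % of the t1 mean."""
    absolute = mean_t2 - mean_t1
    relative = 100 * absolute / mean_t1
    return absolute, relative


# Group means at t1 and t2, taken from Table 2
group_means = {"CLIL": (2758.97, 2842.86), "control": (2516.67, 2749.32)}

for group, (t1, t2) in group_means.items():
    absolute, relative = gains(t1, t2)
    print(f"{group}: +{absolute:.2f} points, {relative:.1f}% relative gain")
```

On these means the CLIL group gains about 84 points (roughly 3%), while the control group gains about 233 points (roughly 9%), although the CLIL group's absolute scores remain higher at both measurements.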

Frequency effects of vocabulary input 

Frequency is considered a key determinant of language acquisition, and
higher frequency forms in the input are predicted to enable earlier
automatisation (Milton, 2009; Ortega, 2015). In order to find out if and to what
extent vocabulary input by the CLIL teachers could have stimulated vocabulary
growth in the CLIL group, teachers' input from three different subjects
across 11 hours of class time was submitted to frequency analyses
using VocabProfile. The video data were first transcribed, computerised,
and then stripped of all proper names, since those would have skewed the
true nature of the corpus size (Milton, 2009). The three clean corpora were
then fed into the software, which segmented them into frequency bands
of the first (k1), second (k2), and third thousand (k3) most frequent words,
academic words (AWL), and a residual category called off-list. The profiler
produced the frequency counts as illustrated in Table 3.

The three different subjects (chemistry, geography, history) show a
surprisingly similar picture. First, there is the relatively high percentage
(88.11-91.45%) of tokens belonging to the 1,000 most frequent English
words (k1). This and the low Guiraud index of 7.54-8.65 indicate the repetitive
nature of high-frequency words in the CLIL teachers' classroom language
(Milton, 2009). While such a prevalence of basic lexis might well be
pedagogically justified with respect to the entrenchment of English BICS
vocabulary (Ellis, 2013), the low proportion of k2 and k3 words suggests that
more advanced and broader word learning, especially with respect to the
closer frequency bands, may not have been fostered through this
kind of linguistic input in our CLIL groups.
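
The lexical statistics in Table 3 follow from the token and type counts of each corpus. The sketch below recomputes them with the usual definitions: type-token ratio as types divided by tokens, and the Guiraud index as types divided by the square root of tokens. The function name is illustrative; the counts are those reported in Table 3.

```python
import math


def lexical_stats(tokens: int, types: int) -> dict[str, float]:
    """Standard lexical variety measures from token and type counts."""
    return {
        "type_token_ratio": types / tokens,
        "guiraud_index": types / math.sqrt(tokens),
        "tokens_per_type": tokens / types,
    }


# (tokens, types) per subject corpus, as reported in Table 3
corpora = {
    "Chemistry": (7981, 763),
    "Geography": (3473, 510),
    "History": (7502, 653),
}

for subject, (tokens, types) in corpora.items():
    s = lexical_stats(tokens, types)
    print(f"{subject}: TTR = {s['type_token_ratio']:.2f}, "
          f"Guiraud = {s['guiraud_index']:.2f}, "
          f"tokens per type = {s['tokens_per_type']:.2f}")
```

The recomputed Guiraud values are 8.54 (chemistry), 8.65 (geography), and 7.54 (history), matching the range of 7.54-8.65 cited above. Unlike the raw type-token ratio, the Guiraud index partly corrects for corpus length, which matters here because the geography corpus is less than half the size of the other two.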

5 Note that a test which is considered to be appropriate for assessing receptive
vocabulary in EFL as a subject may not be appropriate to assess EFL as a vehicular
language (Jiménez Catalán & Ruiz de Zarobe, 2009).


Table 3. Corpus analysis of teachers' vocabulary input from three CLIL classes

                   Chemistry               Geography               History
                   (4 h class time,        (2 h class time,        (4 h class time,
                   153 minutes coverage)   92 minutes coverage)    183 minutes coverage)
k1 words           88.11%                  91.45%                  90.60%
k2 words           3.95%                   4.69%                   3.91%
k3 words           2.62%                   1.35%                   1.09%
AWL words          1.57%                   0.49%                   0.55%
off-list words     3.76%                   2.02%                   3.85%
tokens             7981                    3473                    7502
types              763                     510                     653
type-token ratio   .10                     .15                     .09
Guiraud index      8.54                    8.65                    7.54
tokens per type    10.46                   6.81                    11.49

Note. k1 = the first 1,000 most frequent words of English, k2 = 1,001-2,000 most
frequent words of English, k3 = 2,001-3,000 most frequent words of English,
AWL = academic word list.


However, two reservations need to be raised immediately. First, following
White (2013), it is still unclear what the optimal ratio of unknown
vocabulary items to the word total in a text would be; in other words, how
many unknown words does a teacher's input need to exhibit in order to
provide stimulating but comprehensible input? This reasoning does not refer
to the well-known debate around the minimum vocabulary knowledge
required to understand authentic texts in reading comprehension tasks
(Hu & Nation, 2002; Nation, 2006). While there seems to be a general consensus
that this minimal proportion of known words ranges around 95%,
there is much less agreement as to the minimal proportion of known words
needed for teachers' input to be both comprehensible and stimulating.

Second, a closer look at the off-lists in our research data raises doubts
as to how and whether the corpus underlying this analysis reflects the desired
subject learning in CLIL. For example, various subject-relevant words
were relegated to this obscure category. In other words, the underlying
algorithm VocabProfile employs may not align with the instructional goals
of CLIL, as a considerable number of the subject-specific words are very
important for a full understanding of the CLIL subject. In this respect, Hyland
and Tse (2007) call for more discipline-specific studies of vocabulary use in
academic settings. Nevertheless, according to the analysis of teachers' input and
the power of frequency effects (Ellis, 2013), vocabulary growth should have
happened at least within the 1,000 most frequent English words, since
those words featured prominently in the input corpora.

If we now take those insights from our corpus analysis and re-examine
our quantitative vocabulary scores within the k1 band, we get a rather
different picture. Consider Figure 3 now, which contrasts the overall test
scores with the k1 scores (1,000 most frequent words).

Figure 3. Interaction plots for mean test scores at both
measurements for overall (left) and k1 (right) results

[Left panel: Interaction between groups and time for overall test scores.
Right panel: Interaction between groups and time for k1 test scores.
Horizontal axis: measurement (1, 2).]



While the left panel of Figure 3 illustrates pronounced vocabulary growth
for the control group, the right-hand panel shows that, within the 1,000
most frequent words of English, only the CLIL group benefits. When k1 test
scores at t1 were centred (M = 0, SD = 1) and controlled for, a regression
model of the t2 test scores (F(3, 75) = 23.79, p < .001, adjusted R² = 47%)
showed that this CLIL effect over time is significant (β = 0.44,
t(0.19) = 2.30, p = .024). In other words, CLIL exposure is effective
among the 1,000 most frequent words when measured against a standardised
and constant β-value.
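
The logic of controlling for a standardised baseline score can be illustrated
with a small ordinary least squares sketch in plain Python. This is not the
authors' analysis: this section does not list the exact predictors of their
model, so the two-predictor layout (standardised t1 score plus a 0/1 group
indicator), the function names and the toy data are all assumptions made for
illustration.

```python
from statistics import mean, pstdev


def standardise(values):
    """Centre to mean 0 and scale to SD 1 (population SD, an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_t2(t1_scores, group, t2_scores):
    """OLS fit of t2 = b0 + b1 * z(t1) + b2 * group; returns [b0, b1, b2].

    group is coded 0 (control) or 1 (CLIL); b2 is the group effect at t2
    once the standardised baseline score is held constant.
    """
    z = standardise(t1_scores)
    rows = [[1.0, zi, float(gi)] for zi, gi in zip(z, group)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, t2_scores)) for i in range(k)]
    return solve(xtx, xty)


# Toy data (hypothetical): six pupils, alternating control and CLIL.
t1 = [10, 12, 14, 16, 18, 20]
group = [0, 1, 0, 1, 0, 1]
t2 = [11, 15, 15, 19, 19, 23]
b0, b1, b2 = fit_t2(t1, group, t2)
print(b0, b1, b2)
```

In this layout, the coefficient on the group indicator plays the role of the
CLIL effect: it compares the two groups at t2 for pupils with the same
standardised score at t1.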

Coming back to hypothesis 2, our results suggest that, within the k1
vocabulary band, vocabulary development was co-determined by CLIL exposure.
Let us now turn to the influence of background variables as
determinants of receptive vocabulary growth.
The influence of background variables

As Sylven (2007, 2013) and Pietila and Merikivi (2014) pointed out,
extra-mural factors might co-determine CLIL-induced vocabulary growth.
Consequently, we also explored possible background variables. These included
pupils' sex and age, the families' education level, whether they had been on
a stay abroad, their grade in the CLIL subject (geography), their English
proficiency (an amalgamation of school grades and self-assessment in the
four skills), their English activities outside school, as well as the
grouping factor CLIL vs. control. After these dimensions of our questionnaire
were pooled using principal component analyses, the vocabulary scores
at t2 were examined in regression models with eight predictors. After
all predictors were checked for variance inflation factors and collinearity,
a step-wise linear regression revealed that in the final model (F(2, 69)
= 13.67, p < .001, adjusted R² = 0.26) only pupils' English proficiency
(β = -344.64, t(74.64) = -4.62, p < .001, ηp² = 0.24) had a significant and
substantial partial effect. The β-coefficient in this model was negative
because a lower value for this predictor (fed into the model as the scores
from the principal component analysis) corresponded to a higher proficiency
level. A complementary classification analysis confirmed that only pupils'
English proficiency level predicted the second test scores. Coming back to
hypothesis 2, extra-mural factors outside the CLIL setting did not play a
significant role in our data.
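
The statistics reported for the proficiency predictor can be checked for
internal consistency with two standard relations. The sketch below assumes
that the value given in parentheses after t is the standard error of the
coefficient (an assumption about the notation, since degrees of freedom would
normally appear there) and that the error degrees of freedom are the 69 from
the F-test. The function names are hypothetical.

```python
def t_statistic(beta: float, standard_error: float) -> float:
    """t-value of a regression coefficient: estimate divided by its SE."""
    return beta / standard_error


def partial_eta_squared(t_value: float, df_error: int) -> float:
    """Partial eta squared of a one-degree-of-freedom effect.

    Uses the identity eta_p^2 = t^2 / (t^2 + df_error).
    """
    t_sq = t_value ** 2
    return t_sq / (t_sq + df_error)


# Values as reported for pupils' English proficiency in the final model.
t_value = t_statistic(-344.64, 74.64)
eta_p2 = partial_eta_squared(t_value, 69)
print(round(t_value, 2), round(eta_p2, 2))
```

Under these assumptions the sketch prints -4.62 and 0.24, in line with the
reported t-value and partial effect size.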

DISCUSSION 

Vocabulary growth is one of the major driving forces of language learning
in CLIL (Bonnet & Dalton-Puffer, 2013). Thus, the aim of this study was to
revisit language growth in CLIL classrooms to find out whether new production
data match concepts such as frequency effects in teachers’ input (Ellis,
2013) and extra-mural factors (Sylven, 2007, 2013).

Let us now discuss the two major findings from our study. First, CLIL
students failed to outperform the controls in terms of overall receptive
vocabulary growth. However, the frequency analyses of teachers’ input revealed
that CLIL exposure actually centred mainly on the 1,000 most frequent
words of English (k1). This is probably the reason why it was only within
this band that we found significant vocabulary growth for CLIL students.

We can think of three possible explanations for this unorthodox result.
The first one relates to the power of frequency effects. Since CLIL students
were vastly more exposed to vocabulary from the k1 band, deeper learning
and entrenchment were to be expected (Ellis, 2013). The reasons for such a
high occurrence of k1 words may lie in the particular pedagogical context
of CLIL, in which subject-specific content comprehension and clarification
are considered to be of utmost importance by the teachers (Gierlinger, 2007,
2015; Hüttner et al., 2013; Llurda & Lasagabaster, 2010; Nikula, 2010). One
of the strategies to reach this aim is explaining and elaborating on
subject-specific concepts through basic, high-frequency vocabulary. The high
type-token ratio plus the high coverage of the k1 band in our CLIL teachers'
input suggests a deliberate effort towards content comprehensibility and
clarification. Other research by Nation & Webb (2011) points out that by
keeping the vocabulary load lower and increasing its repetitions, the amount
of vocabulary learnt will increase.⁶

Second, the CLIL-specific vocabulary growth may reside more significantly
in the area of subject-specific vocabulary, which was not covered by the
testing tool. However, this raises the question of why the use of
subject-specific vocabulary apparently had only a negligible priming effect
(Hoey, Mahlberg, Stubbs, & Teubert, 2007) on academic and general
vocabulary. In other words, one would have expected a much more pronounced
effect between academic and subject-specific vocabulary within
the subject classroom discourses. Arguably, this linguistic puzzle may
result from a more general lack of academic language use at this age level
and within the context of a broad spectrum of learner achievements.
Comparative research between the use of academic language in CLIL classes
and in mainstream classes could shed more light on this issue.


6 Arguably, such teaching techniques would not be applied as systematically and
frequently in a foreign language learning classroom setting. The foreign language
curriculum typically expects an incremental vocabulary progression along pre-defined
proficiency levels (CEFR, Council of Europe, 2001; Milton, 2009). Or, as Decoo (2010)
contended, foreign language learning is a progressive endeavor where the learner should
advance from one point to another, constantly improving.
Third, as suggested by Zydatiss (2012, pp. 27-28), visible receptive
vocabulary growth within CLIL may only be expected after a certain critical
mass of treatment exposure. Thus, a period of 5-6 months of project-based
CLIL exposure might simply fail to reach such a critical mass and thereby
prove less effective. This critical mass phenomenon may be further
aggravated by an implicit teaching approach, which may have a tendency
to delay noticing and hence language learning (Svalberg, 2007; Williams,
2013). A possible language threshold for CLIL is also tentatively pointed out
by Agustin Llach (2014) in her research on primary CLIL. Summing up,
frequency and noticing effects may play a vital role in CLIL vocabulary growth.

The other main finding of our study pertains to the role of extra-mural
factors. While Sylven (2007) found that extra-mural factors did play a
significant role in her study, in our data only learners’ proficiency level in
English predicted the final vocabulary results. Notice, however, that Sylven
(2013), in a theoretical article, related her research outcomes to the
extraordinary linguistic situation of Sweden, where English, in her words, is
“omnipresent” (p. 310).

These findings raise at least three more issues. First, the question
remains whether richer teacher input might result in a broader vocabulary
gain, at least for more advanced learners. Second, would the results
have turned out to be the same in a less immersive, more instructed and
vocabulary-focused teaching and learning context? The massive amount
of recent SLA literature pertaining to the important role of language
awareness, the noticing hypothesis, and explicit knowledge for language
learning (Bot, Lowie, & Verspoor, 2006; Ellis, 2015; Ellis & Shintani, 2013;
Leow, 2015; Williams, 2013) suggests that these issues need to be addressed by
future CLIL research. Third, on a more general level, the overall
heterogeneity of CLIL contexts and implementations makes it dangerous to jump
to conclusions, or, as Bonnet and Dalton-Puffer (2013, p. 279) put it,
“CLIL is itself subject to existing teaching cultures rather than an
omnipotent agent of systemic change”.

To sum up, our results remain puzzling, but perhaps also pioneering,
for the moment. We believe that we can trace parts of these issues
to methodological design problems that come with the difficulty that X-Lex,
like other vocabulary tools, has in dealing with subject-essential CLIL
vocabulary.
Notwithstanding these issues, as implications of our research we propose
some tentative recommendations for further research and CLIL practice.
Given the importance of vocabulary growth in CLIL, there is ample
room for further research into CLIL-induced receptive vocabulary
development over time. We believe that researching CLIL’s potential
over longer periods, together with a careful description of the
methodological instantiations, will reveal a more realistic picture of the
effect of CLIL on the learning of subject and language content. In addition,
we need more studies on learner and teacher vocabulary with respect to
frequency and typology (general, academic, and technical) but also its
relationship to CLIL methodologies, ranging from (totally) immersive to
(more) form-focused approaches. Our data suggest that the mainstream
"language bath" metaphor of CLIL needs to be complemented by more deliberate
and form-focused instructional approaches (Grandinetti et al., 2013;
Lyster, 2013; Nation, 2011).

Finally, our research turned out to be intrinsically complex, because it
studied the development of a complex phenomenon (vocabulary growth)
in complex ecologies (classroom learning) among a multi-variant dynamic
population (school learners, teachers). Controlling quasi-experimental
conditions in such a setting is challenging, and these factors can make
it extremely difficult to adopt traditionally formulated, linearly framed
research methods. By applying a longitudinal and mixed-method approach,
we tried to go beyond a popular but possibly too simplistic comparison
of CLIL and non-CLIL outcomes only. Although the majority of these
comparative studies paint a positive picture with respect to language growth
in CLIL, the results of our study prove to be much less straightforward and
point towards a complex set of factors influencing language growth in
CLIL. All in all, the benefits of our study lie in its explorative and critically
reflective nature. Despite these constraints, we hope that our results prove
to be sufficiently interesting to merit further investigations.